A Formal Lexicon in the Meaning-Text Theory (or How to Do Lexica with Words)
Abstract
The goal of this paper is to present a particular type of lexicon, elaborated within a formal theory of natural language called Meaning-Text Theory (MTT). This theory puts strong emphasis on the development of highly structured lexica. Computational linguistics does, of course, recognize the importance of the lexicon in language processing. However, MTT probably goes further in this direction than various well-known approaches within computational linguistics: it assigns the lexicon a central place, so that the rest of the linguistic description is supposed to pivot around it. It is in this spirit that MTT views its model of natural language, the Meaning-Text Model (MTM). A very rich lexicon that presents individual information about lexemes in a consistent and detailed way is believed to facilitate the general task of computational linguistics by dividing it into two more or less autonomous subtasks: a linguistic one and a computational one. The MTM lexicon, which embodies a vast amount of linguistic information, can be used in different computational applications. We present here a short outline of this lexicon and of its interaction with the other components of the MTM, with special attention to the computational implications of Meaning-Text Theory.

The goal of the present paper is twofold:
1) to present a specific viewpoint on the role of lexica in "intelligent" systems that are designed to process natural-language texts and are based on access to meaning;
2) to present a specific format for such a lexicon, the so-called Explanatory Combinatorial Dictionary (ECD).

We believe that a sufficiently rich lexicon, one that could enable us to solve the major problem of computational linguistics (presenting all the necessary information about natural language in compact form), should be anchored in a formal and comprehensive theory of language.
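As an illustration only, the division of labor described above (linguistic knowledge stored once in the lexicon, then consulted by separate computational applications) can be sketched as a data structure. The sketch below is a minimal, hypothetical Python model, assuming a drastically simplified entry that holds an informal gloss and a table of lexical functions, the ECD's device for recording restricted lexical co-occurrence (for example, Magn for intensifiers and Oper1 for support verbs). The names `LexicalEntry`, `LEXICON`, `apply_lf`, and `intensify` are invented for this example; the glosses are informal placeholders rather than ECD-style definitions, and real ECD entries are far richer.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """Hypothetical, much-simplified ECD-style entry (illustrative only)."""
    lexeme: str
    gloss: str  # informal placeholder, not a real ECD semantic definition
    # Lexical functions: function name (e.g. "Magn") -> list of its values.
    lexical_functions: dict[str, list[str]] = field(default_factory=dict)


# Toy lexicon (hypothetical): the linguistic subtask fills this in,
# independently of any particular application.
LEXICON = {
    "rain": LexicalEntry("rain", "water falling from clouds in drops",
                         {"Magn": ["heavy"]}),
    "attention": LexicalEntry("attention", "concentration of the mind on something",
                              {"Oper1": ["pay"]}),
}


def apply_lf(lf: str, keyword: str) -> list[str]:
    """Computational side: look up the values of lexical function `lf` for `keyword`."""
    entry = LEXICON.get(keyword)
    if entry is None:
        return []
    return entry.lexical_functions.get(lf, [])


def intensify(noun: str) -> str:
    """Toy generation step: prefix the first Magn value if the lexicon lists one."""
    values = apply_lf("Magn", noun)
    return f"{values[0]} {noun}" if values else noun
```

Here `intensify("rain")` yields `"heavy rain"` because the collocation is read from the lexicon rather than hard-coded in the application, which mirrors the separation between the linguistic and computational subtasks.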
The lexicon under discussion, the ECD, was conceived and developed within the framework of a particular linguistic theory, namely Meaning-Text Theory, or MTT (Mel'čuk 1974, 1981, 1988:43-101). Note that this is by no means a theory of how linguistic knowledge could or should be applied in any computational task. MTT is a theory of how to describe and formally present linguistic knowledge, that is, a theory of linguistic description; its contribution to computational linguistics is therefore only partial: it takes care exclusively of the linguistic part of the general endeavor. We cannot present Meaning-Text Theory in detail here, so we will limit …
Journal title:
- Computational Linguistics
Volume 13
Publication date: 1987